Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

Mapping of Sequence Reads to the Reference Genomes ◾ 61

string, $, which is also the first character of F column. The character before $ will be the

first character of L column. In general, the character which comes before can be inferred

with the pairs L and F relationships as shown in Figure 2.10. These relationships are also

illustrated in Figure 2.11 which shows that, first, we recover the first two columns of the

BWM by creating a two-column matrix from columns L and F and sorting it as shown

in Figure 2.11a. The third column is recovered by creating a three-column matrix from

column L and the first two recovered columns and then we sort it (Figure 2.11b). The same

is repeated to recover the fourth column of the sorted rotations as shown in Figure 2.11c.

We will continue like this until we recover all columns and the BWM will be as shown

in Figure 2.11d. The original string is the one ends with the character “$” in the BWM as

indicated in the shaded row in Figure 2.11.

The steps shown in Figures 2.10 and 2.11 are to show how a BWT is reversible. However,

there are several algorithms implemented by software for reversing a BWT using LF

mapping. The BWT is used as data structure for indexing a genome (compressed or

uncompressed) by transforming the entire genome into a BWT. Once a genome has been

transformed, there are several searching algorithms and approaches for finding the posi-

tion of a substring (read) on the reference genome.

FIGURE 2.10 LF mapping to reverse the BWT.

FIGURE 2.11 Using LF mapping to reverse the sorted rotations.